Two Questions about Data-Oriented Parsing
Author
Abstract
In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. We show that parse results on unedited data are worse than on cleaned-up data, although still very competitive compared to other models. As to the parsing of word strings, we show that the hardness of the problem depends not so much on unknown words as on previously unseen lexical categories of known words. We give a novel method for parsing these words by estimating the probabilities of unknown subtrees. The method is of general interest since it shows that good performance can be obtained without the use of a part-of-speech tagger. To the best of our knowledge, our method outperforms other statistical parsers tested on Penn Treebank word strings.
Related resources
Two Questions about Data-Oriented Parsing (cmp-lg/9606022, 17 Jun 1996)
In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that conta...
A View of Parsing
The questions before this panel presuppose a distinction between parsing and interpretation. There are two other simple and obvious distinctions that I think are necessary for a reasonable discussion of the issues. First, we must clearly distinguish between the static specification of a process and its dynamic execution. Second, we must clearly distinguish two purposes that a natural language p...
A U-DOP approach to modeling language acquisition
In linguistics, there is a debate between empiricists and nativists: the former believe that language is acquired from experience, the latter that there is an innate component for language. The main arguments adduced by nativists are Arguments from Poverty of Stimulus. It is claimed that children acquire certain phenomena, which they cannot learn on the basis of experience alone —and therefore,...
Semantic Case Analysis of Informal Requirements
Case grammars provide a natural basis for an object-oriented analysis of software requirements. Two important areas of object-oriented requirements analysis are addressed: (1) identification of entities which should be modeled as objects in the software design; and (2) detection of inconsistencies in the requirements documents. Available heuristics to identify these entities are based on intuiti...
Polynomial Tree Substitution Grammars: an efficient framework for Data-Oriented Parsing
Finding the most probable parse tree in the framework of Data-Oriented Parsing (DOP), a Stochastic Tree Substitution Parsing scheme developed by R. Bod (Bod 92), has proven to be NP-hard in the most general case (Sima’an 96a). However, introducing some a priori restrictions on the choice of the elementary trees (i.e. grammar rules) leads to interesting DOP instances with polynomial time-complex...
Journal title:
- CoRR
Volume cmp-lg/9606022, Issue
Pages -
Publication date 1996